Multi-Chain Prefetching: Exploiting Memory Parallelism in Pointer-Chasing Codes

Authors

  • Nicholas Kohout
  • Seungryul Choi
  • Donald Yeung
Abstract

As the processor-memory performance gap continues to widen, application performance becomes increasingly limited by the memory system. Applications that employ linked data structures (LDSs) are particularly challenging from the standpoint of the memory system because of the memory serialization effects associated with indirect memory addressing. Also known as the pointer-chasing problem, such memory serialization effects prevent latency tolerance techniques from overlapping the cache misses suffered along a chain of indirect memory references. Consequently, latency tolerance techniques are limited in their ability to tolerate the latency of cache misses arising from pointer chasing. While pointer chasing is inherently sequential, pointer-chasing computations typically traverse multiple pointer chains in an independent fashion. Such independent pointer-chasing traversals represent a large source of memory parallelism. This paper explores the possibility of exploiting such memory parallelism for the purposes of memory latency tolerance. First, we present a memory scheduling algorithm that computes a prefetch schedule from an LDS traversal specification that overlaps serialized memory fetches across multiple pointer-chasing traversals. Second, we present a prefetch engine architecture capable of traversing LDSs. Our prefetch engine issues prefetches according to the prefetch schedule, thus exploiting the memory parallelism uncovered by our scheduling algorithm. Finally, we conduct an experimental evaluation of our prefetching technique on four pointer-chasing applications. Our results show multi-chain prefetching increases performance between 52% and 78%. However, early initiation of prefetches causes prefetch buffer thrashing, thus limiting further increases in performance. Additional simulations show that prefetch buffer thrashing can be eliminated in some cases by using a larger prefetch buffer and by reducing the conservative prefetch distances computed by our scheduling algorithm.
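The idea of overlapping serialized fetches across independent chains can be illustrated with a toy cost model. The sketch below is not the paper's scheduling algorithm or prefetch engine; the function names, the fixed per-node `latency` and `work` costs, and the assumption that every chain starts prefetching at time 0 with an unbounded prefetch buffer are all hypothetical simplifications chosen for illustration.

```python
def serial_time(chain_lengths, latency, work):
    """Cycles when every node fetch is a demand miss and nothing overlaps.

    Hypothetical model: each node costs one full memory latency plus
    a fixed amount of processor work.
    """
    return sum(n * (latency + work) for n in chain_lengths)


def overlapped_time(chain_lengths, latency, work):
    """Cycles when all chains are prefetched in parallel from time 0.

    Within one chain the fetches stay serialized: node j (0-based) cannot
    be requested before node j-1 arrives, so it is ready at (j + 1) * latency.
    Different chains are independent, so their fetches overlap.
    The processor still consumes the chains one after another.
    Assumes an unbounded prefetch buffer (no thrashing).
    """
    cpu_free = 0
    for n in chain_lengths:
        for j in range(n):
            ready = (j + 1) * latency          # arrival time of node j
            cpu_free = max(cpu_free, ready) + work  # wait if needed, then work
    return cpu_free


if __name__ == "__main__":
    chains = [4, 4, 4, 4]  # four independent lists of four nodes each
    print(serial_time(chains, latency=100, work=10))      # 1760
    print(overlapped_time(chains, latency=100, work=10))  # 530
```

In this model only the first chain exposes its full serialized latency; the remaining chains are already resident when the processor reaches them. The unbounded-buffer assumption is exactly what the abstract warns about: with a finite prefetch buffer, starting every chain this early could evict data before it is used.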

Related resources

Multi-Chain Prefetching: Exploiting Natural Memory Parallelism in Pointer-Chasing Codes

This paper presents a novel pointer prefetching technique, called multi-chain prefetching. Multi-chain prefetching tolerates serialized memory latency commonly found in pointer chasing codes via aggressive prefetch scheduling. Unlike conventional prefetching techniques that hide memory latency underneath a single traversal loop or recursive function exclusively, multi-chain prefetching initiate...

The Efficacy of Software Prefetching and Locality Optimizations on Future Memory Systems

Software prefetching and locality optimizations are techniques for overcoming the speed gap between processor and memory. In this paper, we provide a comprehensive summary of current software prefetching and locality optimization techniques, and evaluate the impact of memory trends on the effectiveness of these techniques for three types of applications: regular scientific codes, irregular scie...

Asynchronous Memory Access Chaining

In-memory databases rely on pointer-intensive data structures to quickly locate data in memory. A single lookup operation in such data structures often exhibits long-latency memory stalls due to dependent pointer dereferences. Hiding the memory latency by launching additional memory accesses for other lookups is an effective way of improving performance of pointer-chasing codes (e.g., hash tabl...

Storage Efficient Hardware Prefetching using Delta-Correlating Prediction Tables

This paper presents a novel prefetching heuristic called Delta-Correlating Prediction Tables (DCPT). DCPT builds upon two previously proposed techniques, RPT prefetching by Chen and Baer and PC/DC prefetching by Nesbit and Smith. It combines the storage-efficient table-based design of Reference Prediction Tables (RPT) with the high-performance delta-correlating design of PC/DC. DCPT substantiall...

A Programmable Memory Hierarchy for Prefetching Linked Data Structures

Prefetching is often used to overlap memory latency with computation for array-based applications. However, prefetching for pointer-intensive applications remains a challenge because of the irregular memory access pattern and pointer-chasing problem. In this paper, we use a programmable processor, a prefetch engine (PFE), at each level of the memory hierarchy to cooperatively execute instruction...

Publication date: 2000